
[SPARK-56909][SQL] Simplify Cast to int/long codegen under ANSI mode #55934

Open
gengliangwang wants to merge 3 commits into apache:master from gengliangwang:SPARK-56909-cast-int-long

Conversation

@gengliangwang
Member

@gengliangwang gengliangwang commented May 17, 2026

What changes were proposed in this pull request?

In Cast.scala, the ANSI codegen for narrowing casts to int / long previously emitted a 5-line inline body per call site (bounds check + cast + throw). After this PR it emits a single static call into the existing LongExactNumeric / FloatExactNumeric / DoubleExactNumeric objects in numerics.scala, which already implement the same overflow check + castingCauseOverflowError throw that this codegen needs.
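
To make the before/after shape concrete, here is a minimal Java sketch of a `long -> int` narrowing in both styles. The class and method names are hypothetical, and `ArithmeticException` stands in for Spark's `castingCauseOverflowError`; the exact source text Spark emits may differ.

```java
// Hypothetical stand-in; not Spark's actual generated code or helper class.
final class ExactNarrowingSketch {

    // Shape of the old per-call-site body: bounds check + cast + throw, inlined.
    static int inlineStyle(long v) {
        int result;
        if (v == (int) v) {
            result = (int) v;
        } else {
            // Stand-in for castingCauseOverflowError(v, FROM, TO).
            throw new ArithmeticException("Casting " + v + " from BIGINT to INT causes overflow");
        }
        return result;
    }

    // Shared helper: the same check, written once.
    static int toIntExact(long v) {
        if (v != (int) v) {
            throw new ArithmeticException("Casting " + v + " from BIGINT to INT causes overflow");
        }
        return (int) v;
    }

    // New call-site shape: a single static call replaces the inline body.
    static int callStyle(long v) {
        return toIntExact(v);
    }

    public static void main(String[] args) {
        System.out.println(inlineStyle(42L));
        System.out.println(callStyle(42L));
    }
}
```

Both styles accept and reject the same inputs; only the amount of source repeated at each call site differs.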

The rewrite uses the same getClass.getCanonicalName.stripSuffix("$") pattern as the adjacent MathUtils / IntervalMathUtils calls. The Scala compiler emits public static forwarders on the companion class of top-level objects, so generated Java code can call e.g. org.apache.spark.sql.types.LongExactNumeric.toInt(v) directly.
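
The forwarder mechanism can be imitated in plain Java. The sketch below hand-writes the two classes that the Scala compiler would produce for a top-level `object`, and shows why stripping the trailing `$` from the module class name yields a name that Java code can call statically. All class names here are hypothetical, and `Math.toIntExact` is only a placeholder body.

```java
// Hand-written Java imitation of the classes scalac emits for a top-level
// `object ForwarderDemo`. All names are hypothetical.
final class ForwarderNameSketch {

    // Java equivalent of Scala's getClass.getCanonicalName.stripSuffix("$").
    static String callableName(Class<?> moduleClass) {
        String name = moduleClass.getCanonicalName();
        return name.endsWith("$") ? name.substring(0, name.length() - 1) : name;
    }

    public static void main(String[] args) {
        // Builds the text a code generator would splice into Java source.
        String target = callableName(ForwarderDemo$.class);
        System.out.println(target + ".toInt(v)");
    }
}

// The module class: holds the singleton instance and the real method body.
final class ForwarderDemo$ {
    static final ForwarderDemo$ MODULE$ = new ForwarderDemo$();

    private ForwarderDemo$() {}

    int toInt(long v) {
        return Math.toIntExact(v); // placeholder body for the sketch
    }
}

// The class without the `$`: one static forwarder per public method.
final class ForwarderDemo {
    static int toInt(long v) {
        return ForwarderDemo$.MODULE$.toInt(v);
    }
}
```

Generated code that calls `ForwarderDemo.toInt(v)` never has to mention `MODULE$`, which keeps the emitted call to one short expression.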

Touched Cast.scala helpers:

  • castIntegralTypeToIntegralTypeExactCode: the int target branch now emits LongExactNumeric.toInt($c) (byte/short narrowing stays inline; refactored in SPARK-56910).
  • castFractionToIntegralTypeCode: the int / long target branches now emit FloatExactNumeric / DoubleExactNumeric toInt / toLong (byte/short narrowing stays inline; refactored in SPARK-56910).
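
For the fractional branches, an exact conversion has to reject values whose truncation falls outside the target range, as well as NaN. The following is a sketch under assumptions: the class and method names are hypothetical, the bounds conditions are written for illustration rather than copied from `numerics.scala`, and `ArithmeticException` stands in for `castingCauseOverflowError`.

```java
// Hypothetical sketch of exact fractional -> integral conversion.
final class FractionToIntegralSketch {

    // double -> int: the value truncated toward zero must fit in the int range.
    // NaN fails both comparisons and is rejected.
    static int doubleToIntExact(double x) {
        if (Math.floor(x) <= Integer.MAX_VALUE && Math.ceil(x) >= Integer.MIN_VALUE) {
            return (int) x;
        }
        throw new ArithmeticException("Casting " + x + " from DOUBLE to INT causes overflow");
    }

    // double -> long: Long.MAX_VALUE is not exactly representable as a double,
    // so this sketch compares against 2^63 with a strict '<' instead.
    static long doubleToLongExact(double x) {
        if (x >= -0x1p63 && x < 0x1p63) {
            return (long) x;
        }
        throw new ArithmeticException("Casting " + x + " from DOUBLE to BIGINT causes overflow");
    }

    public static void main(String[] args) {
        System.out.println(doubleToIntExact(3.9));
        System.out.println(doubleToLongExact(-2.5));
    }
}
```

A plain Java `(int) x` or `(long) x` would silently saturate on out-of-range input and map NaN to 0, which is why the ANSI path needs the explicit check before the cast.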

Primitive widening branches and the non-ANSI paths are untouched.

Why are the changes needed?

Part of SPARK-56908 (umbrella). The narrow-cast ANSI branches in Cast.doGenCode are some of the longer inline bodies still emitted per call site. Multiplied across the many cast paths in a TPC-DS plan, they contribute meaningfully to the generated source size and Janino compile time, and push whole-stage methods closer to the 64KB JVM method limit.

Compared to v1 of this PR (which added a new CastUtils.java with longToIntExact / floatToIntExact / etc.), this version calls the existing LongExactNumeric.toInt / FloatExactNumeric.toInt / toLong / DoubleExactNumeric.toInt / toLong directly. Those are public static forwarders on top-level Scala objects that already implement the same castingCauseOverflowError(v, FROM, TO) throw — no new helper class needed. (Applying the same lesson cloud-fan called out on #55938.)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *ExpressionClassIdentitySuite"

307/307 pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

@gengliangwang
Member Author


Stack overview (SPARK-56908 umbrella)

This PR is part of a stack of 8 PRs against SPARK-56908. Order:

  1. #55934: [SPARK-56909][SQL] Simplify Cast to int/long codegen under ANSI mode (this PR, the stack base)
  2. #55935: [SPARK-56910][SQL] Simplify Cast to byte/short codegen under ANSI mode
  3. #55936: [SPARK-56911][SQL] Simplify Cast to decimal codegen under ANSI mode
  4. #55937: [SPARK-56912][SQL] Simplify Cast to boolean codegen under ANSI mode
  5. #55939: [SPARK-56914][SQL] Simplify decimal arithmetic codegen under ANSI mode (depends on #55936)
  6. #55938: [SPARK-56913][SQL] Simplify BinaryArithmetic byte/short codegen under ANSI mode (independent)
  7. #55940: [SPARK-56915][SQL] Simplify MakeDate/MakeInterval codegen under ANSI mode (independent)
  8. #55941: [SPARK-56916][SQL] Simplify ElementAt array codegen under ANSI mode (independent)

PRs 1-4 are linearly stacked on each other (each branch is based on the previous one). PR 5 (decimal arithmetic) is stacked on top of PR 3 (cast decimal) since it uses CastUtils.changePrecisionExact. PRs 6, 7, 8 branch off master independently.

### What changes were proposed in this pull request?

Introduce `CastUtils.java` and use it from `Cast.scala` to collapse the
multi-line ANSI overflow-check codegen for casts that target `int` and
`long` into one-line static-method calls. Source and target `DataType`
constants used in the overflow error message live as `private static
final` fields on the helper class, so the happy path performs no per-row
`references[]` lookups.

Helpers added:
* `longToIntExact(long)` for narrowing `long -> int`.
* `floatToIntExact(float)`, `doubleToIntExact(double)` for fractional
  -> int.
* `floatToLongExact(float)`, `doubleToLongExact(double)` for fractional
  -> long.
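
As an illustration of the "constants as `private static final` fields" idea, here is a minimal sketch of one such helper. It is not the `CastUtils.java` from this PR: the class name, the string constants (standing in for `DataType` objects), the bounds check, and the use of `ArithmeticException` are all assumptions made for the example.

```java
// Hypothetical sketch; not the CastUtils.java described above.
final class CastUtilsSketch {

    // Stand-ins for the source/target DataType constants. They are resolved
    // once at class initialization and are only read on the error path.
    private static final String FROM_TYPE = "FLOAT";
    private static final String TO_TYPE = "BIGINT";

    private CastUtilsSketch() {}

    // float -> long with an overflow check; the happy path is a range test
    // followed by a primitive cast, with no array or table lookups.
    static long floatToLongExact(float x) {
        if (x >= -0x1p63f && x < 0x1p63f) {
            return (long) x;
        }
        throw new ArithmeticException(
            "Casting " + x + " from " + FROM_TYPE + " to " + TO_TYPE + " causes overflow");
    }
}
```

A call site in generated code would then reduce to something like `long out = CastUtilsSketch.floatToLongExact(in);`.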

`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` and
  `castFractionToIntegralTypeCode` dispatch on the target type: `int`
  (and `long` for the fraction case) emit a `CastUtils.<...>Exact` call;
  byte/short targets keep the inline body (refactored in SPARK-56910).
* Eval paths for `castToInt` add ANSI `LongType` / `FloatType` /
  `DoubleType` cases, and `castToLong` adds `FloatType` / `DoubleType`
  cases, both delegating to the new helpers.

### Why are the changes needed?

Part of SPARK-56908. The current ANSI cast codegen emits 5-line inline
overflow blocks per call site. Multiplied across the many cast paths in
a TPC-DS plan, this contributes meaningfully to the generated source size
and to Janino compile time, and pushes whole-stage methods closer to the
64KB JVM method limit.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source
text changes.

### How was this patch tested?

`build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite
*CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite
*ExpressionClassIdentitySuite"` — 312/312 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

@gengliangwang gengliangwang force-pushed the SPARK-56909-cast-int-long branch from 8eba972 to 7209218 on May 18, 2026 17:06